Tags: llama cpp*


Serving Large models (part one): VLLM, LLAMA CPP Server, and SGLang

This guide covers three prominent projects for serving large language models and vision-language models: vLLM, the llama.cpp server, and SGLang. For each project it describes the features, usage instructions, and deployment methods.

2024-09-30 Tags: vllm, llama cpp, llm by klotz

Llama.Cpp

obtain the original LLaMA model weights and place them in ./models

ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
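Before moving on to conversion, it can help to confirm that the directory really has the layout listed above. The following is a minimal sketch of such a check; check_models is a hypothetical helper, not part of llama.cpp, and it only tests for the per-size weights directory and tokenizer.model.

```shell
#!/usr/bin/env bash
# Sketch: verify the layout listed above before converting.
# check_models is a hypothetical helper, not part of llama.cpp.

# check_models DIR SIZE
# Returns 0 if DIR/SIZE/ and DIR/tokenizer.model both exist, 1 otherwise.
check_models() {
  local dir=$1 size=$2 status=0
  if [ ! -d "$dir/$size" ]; then
    echo "missing weights directory: $dir/$size" >&2
    status=1
  fi
  if [ ! -f "$dir/tokenizer.model" ]; then
    echo "missing tokenizer: $dir/tokenizer.model" >&2
    status=1
  fi
  return "$status"
}

if check_models ./models 7B; then
  echo "layout looks ready for conversion"
else
  echo "fix the layout before running the next steps"
fi
```

The check reports every missing piece rather than stopping at the first one, so a single run shows everything that still needs to be put in place.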

install Python dependencies

python3 -m pip install -r requirements.txt

convert the 7B model to ggml FP16 format

python3 convert.py models/7B/

quantize the model to 4-bits (using q4_0 method)

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
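To get a feel for what the 4-bit step saves, the sketch below estimates storage for FP16 versus a q4_0-style layout. It rests on assumptions that the text above does not state: that q4_0 packs weights in blocks of 32 with one 16-bit scale per block (18 bytes per block, roughly 4.5 bits per weight), and that "7B" means a nominal 7 billion parameters. Real files will differ somewhat because of metadata and tensors that are not quantized. The estimate_bytes helper is hypothetical.

```shell
#!/usr/bin/env bash
# Rough size estimate for FP16 vs. q4_0-style storage.
# Assumptions (not from the text above): 32 weights per block and
# 18 bytes per block (2-byte scale + 16 bytes of packed 4-bit values).

# estimate_bytes PARAMS BYTES_PER_BLOCK WEIGHTS_PER_BLOCK
estimate_bytes() {
  local params=$1 block_bytes=$2 block_weights=$3
  # Round up to whole blocks, then multiply by the per-block size.
  echo $(( (params + block_weights - 1) / block_weights * block_bytes ))
}

params=7000000000                            # nominal "7B" count (assumption)
fp16_bytes=$(estimate_bytes "$params" 2 1)   # 2 bytes per weight
q4_bytes=$(estimate_bytes "$params" 18 32)   # 18 bytes per 32 weights

echo "FP16: $fp16_bytes bytes"
echo "q4_0: $q4_bytes bytes"
```

Under these assumptions the quantized file comes out at a little over a quarter of the FP16 size, since 4.5 bits replace 16 bits per weight.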

run the inference

./main -m ./models/7B/ggml-model-q4_0.bin -n 128
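The convert, quantize, and run steps above can be chained into one script. This is only a sketch: MODEL_DIR, DRY_RUN, and the run and pipeline helpers are hypothetical names introduced for illustration, while the commands themselves are the ones quoted above and assume a built llama.cpp checkout in the current directory. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: chain the convert, quantize, and run steps for one model size.
# MODEL_DIR, DRY_RUN, run, and pipeline are hypothetical names; the
# commands are the ones quoted above.

MODEL_DIR=${MODEL_DIR:-./models/7B}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command, and execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Run the three steps in order, stopping at the first failure.
pipeline() {
  local f16="$MODEL_DIR/ggml-model-f16.bin"
  local q4="$MODEL_DIR/ggml-model-q4_0.bin"
  run python3 convert.py "$MODEL_DIR/" &&
  run ./quantize "$f16" "$q4" q4_0 &&
  run ./main -m "$q4" -n 128
}

pipeline
```

Setting DRY_RUN=0 in the environment executes the steps instead of printing them. Chaining with && keeps a failed conversion from feeding a missing file into the quantize step.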

2023-06-05 Tags: github, llama, llama cpp, llm, self-hosted by klotz


SemanticScuttle - klotz.me